Resampling methods

Bootstrapping the sample median

The following program generates an artificial sample of size 50 from the standard normal and bootstraps the distribution of the sample mean and median. Note that the theoretical standard error of the sample median for the standard normal of size n is Ö (0.5p / n) which is approximately 0.177 for n = 50. By comparison, the standard error of the sample mean is Ö(1 / n) » 0.141.

 
' bootstrap sample mean and median ( 11/2/99 h)
' series proc version (slower than matrix version)
' last revised 3/07/2007
 
 
' set size of sample
!n = 50
 
' create workfile
wfcreate bootmed1 u 1 !n
 
' set seed of random number generator
rndseed 456
 
' generate pseudo-draws from a standard normal
series x = nrnd
 
' set number of bootstrap replications
!reps = 10000
 
' store means and medians in matrix
matrix(!reps,2) out
 
' bootstrap loop
tic
for !i = 1 to !reps
   ' generate bootstrap sample
   x.resample x_b
   ' store bootstrap median
   out(!i,1) = @median(x_b)
   ' store bootstrap mean
   out(!i,2) = @mean(x_b)
next
scalar elp = @toc
 
' display descriptive statistics
show out.stats
 
' theoretical standard error of median from normal
' se(med) = 1 / ( 2*m*(obs)^0.5 )
' where m is the median value
' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)
!pi = @acos(-1)
scalar se_med = @sqrt(0.5*!pi/!n)
scalar se_mea = @sqrt(1/!n)
 
' convert bootstrap sample into series
expand 1 !reps
smpl 1 !reps
 
series out_medi
series out_mean
group gout out_medi out_mean
mtos(out, gout)
 
' display results in table
table tab
setcolwidth(tab,1,20)
 
tab(1,2) = "median"
tab(1,3) = "mean"
 
setline(tab,2)
 
tab(3,1) = "bootstrap mean:"
tab(3,2) = @mean(out_medi)
tab(3,3) = @mean(out_mean)
 
tab(4,1) = "bootstrap s.d.:"
tab(4,2) = @stdev(out_medi)
tab(4,3) = @stdev(out_mean)
 
tab(5,1) = "theoretical s.d.:"
tab(5,2) = se_med
tab(5,3) = se_mea
 
setline(tab,6)
 
tab(7,1) = "sample size = "
tab(7,2) = @str(!n)
 
tab(8,1) = "bootstrap replications = " 
tab(8,2) = @str(!reps)
 
tab(9,1) = "elapsed time = " 
tab(9,2) = @str(elp) + " seconds"
show tab
 

The following is the matrix function version of the above program that produces identical results, but takes less time to process.

 
' bootstrap sample mean and median
' matrix function version (faster than series version)
'last revised 3/07/2007
 
' create workfile
wfcreate bootmedian2 u 1 1
 
' set seed of random number generator
rndseed 456
 
' set size of sample
!n = 50
 
' generate pseudo-draws from a standard normal
matrix(!n,1) x
nrnd(x)
 
' set number of bootstrap replications
!reps = 10000
 
' store means and medians in vectors
vector(!reps,1) out_medi
vector(!reps,1) out_mean
 
' declare matrix to hold bootstrap sample
matrix x_b
 
' bootstrap loop
' reset timer
tic
for !i = 1 to !reps
   ' generate bootstrap sample
   x_b = @resample(x)
   ' store bootstrap median
   out_medi(!i) = @median(x_b)
   ' store bootstrap mean
   out_mean(!i) = @mean(x_b)
next
' store elapsed time in seconds
scalar elp = @toc
 
' theoretical standard error of median from normal
' se(med) = 1 / ( 2*m*(obs)^0.5 )
' where m is the median value
' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)
!pi = @acos(-1)
scalar se_med = @sqrt(0.5*!pi/!n)
scalar se_mea = @sqrt(1/!n)
 
' display results in table
table tab
setcolwidth(tab,1,20)
 
tab(1,2) = "median"
tab(1,3) = "mean"
 
setline(tab,2)
 
tab(3,1) = "bootstrap mean:"
tab(3,2) = @mean(out_medi)
tab(3,3) = @mean(out_mean)
 
tab(4,1) = "bootstrap s.d.:"
tab(4,2) = @stdev(out_medi)
tab(4,3) = @stdev(out_mean)
 
tab(5,1) = "theoretical s.d.:"
tab(5,2) = se_med
tab(5,3) = se_mea
 
setline(tab,6)
 
tab(7,1) = "sample size = "
tab(7,2) = @str(!n)
 
tab(8,1) = "bootstrap replications = " 
tab(8,2) = @str(!reps)
 
tab(9,1) = "elapsed time = " 
tab(9,2) = @str(elp) + " seconds"
show tab
 

^Top

Two-sample permutation test

The following program replicates the permutation test in Johnston-DiNardo (1997, 11.3). The hypothesis under test is whether there is a significant difference in the sample of employment changes between two states. The program constructs a nonparameteric 95% confidence interval of the difference in means using 1000 permutation samples.

 
' two-sample permutation test ( 11/3/99 )
' (Johnston-DiNardo 11.2)
' matrix permutation function (fast)
' checked 3/6/2007
 
' create workfile
wfcreate cardkrug u 1 1
 
' input data in matrix (first 33 rows from N.J.)
matrix(40) data
data.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5
 
' extract submatrices for each sample
matrix sub0 = @subextract(data,1,1,33,1)
matrix sub1 = @subextract(data,34,1)
 
' compute difference in means for actual sample
scalar mdiff = @mean(sub0) - @mean(sub1)
 
' set number of replications
!reps = 1000
 
' set seed of random number generator
rndseed 123456
 
' declare storage matrices
matrix pdata                             ' permuted data
vector(!reps) pdiff                      ' difference in means from pdata
 
' permutation loop
for !i=1 to !reps
   ' permute data
   pdata = @permute(data)
   ' extract submatrices from permuted data
   sub0 = @subextract(pdata,1,1,33,1)
   sub1 = @subextract(pdata,34,1)
   ' store difference in means
   pdiff(!i) = @mean(sub0) - @mean(sub1) 
next
 
' 95% interval (percentile method)
scalar lower = @quantile(pdiff,0.025) 
scalar upper = @quantile(pdiff,0.975)
 
' display output in table
table out
setcolwidth(out,1,30)
out(1,1) = "Sample difference in means:"
out(1,2) = mdiff
 
out(2,1) = "Permutation confidence interval:"
out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]" 
 
out(3,1) = "Number of permutations:"
out(3,2) = @str(!reps)
show out
 

For reference, we also provide a program that uses the group resample procedure. This program also computes the t-statistic for testing the difference of means.

 
' two-sample t-test ( 11/3/99 )
' (Johnston-DiNardo 11.2)
' group resample function (slow)
' checked 3/6/2007
 
' create workfile
wfcreate cardkrug u 1 40
 
' input data in matrix
' first 33 from N.J.
series x
x.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5
 
' create dummy for each subsample
series dum = 0
smpl 1 33
dum = 1
smpl @all
 
'----------------------------------------------------------------------------
' parametric t-test
'----------------------------------------------------------------------------
 
freeze(ttest) x.testby(mean) dum
show ttest
 
'----------------------------------------------------------------------------
' non-parametric permutation test
'----------------------------------------------------------------------------
 
' compute difference in means for actual sample
series mdiff = @mean(x,"@all if dum=1") - @mean(x,"@all if dum=0")
scalar dmean = @elem(mdiff,"1")
 
' set number of replications
!reps = 1000
 
' set seed of random number generator
rndseed 123456
 
' declare storage matrices
matrix pdata                             ' permuted data
vector(!reps) pdiff                      ' difference in means from pdata
 
' permutation loop
for !i=1 to !reps
   ' permute data
   x.resample(permute)  x_b
   ' compute difference in means
   mdiff = @mean(x_b,"@all if dum=1") - @mean(x_b,"@all if dum=0")
   ' store difference in means
   pdiff(!i) = @elem(mdiff,"1") 
next
 
' 95% interval (percentile method)
scalar lower = @quantile(pdiff,0.025) 
scalar upper = @quantile(pdiff,0.975)
 
' display output in table
table out
setcolwidth(out,1,30)
out(1,1) = "Sample difference in means:"
out(1,2) = dmean
 
out(2,1) = "Permutation confidence interval:"
out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]" 
 
out(3,1) = "Number of permutations:"
out(3,2) = @str(!reps)
show out
 
 

^Top